Method and apparatus for dynamically adjusting resources assigned to plurality of customers, for meeting service level agreements (SLAs) with minimal resources, and allowing common pools of resources to be used across plural customers on a demand basis

ABSTRACT

A method (and system) for managing and controlling allocation and de-allocation of resources based on a guaranteed amount of resource and additional resources based on a best effort for a plurality of customers, includes dynamically allocating server resources for a plurality of customers, such that the resources received by a customer are dynamically controlled and the customer receives a guaranteed minimum amount of resources as specified under a service level agreement (SLA).

The present Application is a Continuing Application of U.S. patent application Ser. No. 09/559,065, filed on Apr. 28, 2000, now U.S. Pat. No. 7,054,943.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to a world-wide network, and more particularly to a plurality of Internet World Wide Web (WWW) sites of various owners that are hosted by a service provider using a group of servers while meeting agreed-upon service levels.

2. Description of the Related Art

The Internet is the world's largest network, and it has become essential to businesses as well as to consumers. Many businesses have started out-sourcing their e-business and e-commerce Web sites to service providers, instead of operating their Web sites on their own server(s) and managing them by themselves. Such a service provider must install a collection of servers in a farm called a “Web Server Farm (WSF)”, or a “Universal Server Farm (USF)”, which can be used by many different businesses to run their e-commerce and e-business applications. These business customers (i.e., the service provider's “customers”) have different “server resource” requirements for their Web sites and applications.

When businesses (hereafter referred to as “customers” or “customers of a server farm”) out-source their e-commerce and/or e-business to a service provider, they must obtain some guarantee on the services they are getting (and will continue to obtain) from the service provider for their sites. Once the service provider has made a commitment to a customer to provide a certain “level” of service (referred to as a “Service Level Agreement (SLA)”), the provider must guarantee that level of service to that customer.

FIG. 1 illustrates an abstracted view of a conventional server farm. A server farm 103 includes multiple servers which host customer applications, and is connected to Internet 101 via communications link(s) 102. Each customer's server resource requirements change, since the demands on customers' applications change continuously throughout each day of operations.

However, a problem with the conventional system and the method used thereby is that, prior to the present invention, there has been no provision for dynamically equipping the server farm such that server(s) and their resources can be dynamically allocated. Hence, there has been no flexibility in dynamically allocating servers and their resources to customers as the customers' demands change. This results in system-wide inefficiency and general dissatisfaction on the part of the customer.

Another problem with the conventional system is that there are no Service Level Agreements (SLAs) based on dynamic allocation and de-allocation of servers to customers' server clusters.

Yet another problem with the conventional system is that there is no provisioning of SLAs in support of both a guaranteed number of servers and optional additional servers based on the workload changes to customers' applications. Yet another problem with the conventional system is that a “hacker” or “hackers” can generate a large amount of workload to a customer's sites or to the server farm itself to “crash” the servers or the server farm.

SUMMARY OF THE INVENTION

In view of the foregoing and other problems of the conventional methods and structures, an object of the present invention is to provide a method and structure in which an allocation of server resources for a plurality of customers is dynamically controlled.

Another object of the present invention is to support (minimum, maximum) server resource-based service level agreements for a plurality of customers.

Yet another object of the present invention is to control the allocation of additional server resources to a plurality of customers using the bounds on given service level metrics.

Still another object of the present invention is to support various service level metrics.

A further object of the present invention is to support the use of different metrics for different customers.

Another object of the present invention is to use a service level metric, the amount of allocated resources, and the inbound traffic rate, for defining the state of the current service level (M,N,R) for each customer.

Another object of the present invention is to use a “target” service level metric Mt to keep the actual service level M close to the target service level.

A further object of the present invention is to compute a “target” amount of resources Nt and a “target” inbound traffic rate Rt from a given Mt and (M,N,R).

Still another object of the present invention is to provide and use formulas for computing Nt and Rt from Mt and (M,N,R).

A still further object of the present invention is to allow the use of numerical analysis or quick simulation techniques for deriving Nt and Rt in place of using formulas invented and described in this patent application.

Yet another object of the present invention is to support resource utilization U, average response time T, and response time percentile T % as the actual service level metric M (and therefore to support the targets Ut, Tt and Tt %).

Another object of the present invention is to provide a method (decision algorithm) for deciding whether to add additional server resource(s) or to reduce (“throttle down”) the inbound traffic to meet the service level agreements for a plurality of customers.

In a first aspect of the present invention, a method (and system) for managing and controlling allocation and de-allocation of resources based on a guaranteed amount of resource and additional resources based on a best effort for a plurality of customers, includes dynamically allocating server resources for a plurality of customers, such that the resources received by a customer are dynamically controlled and the customer receives a minimum (e.g., a minimum that is guaranteed) amount of resources as specified under a service level agreement (SLA).

In another aspect, a program storage device is provided for storing the program of the inventive method.

With the unique and unobvious features of the present invention, a server farm is equipped with a means to dynamically allocate servers (or server resources) to customers as demands change.

It is noted that a general service level agreement (SLA) on a server resource for a customer can be denoted by (Smin#(i), Smax#(i), Mbounds(i)), where Smin#(i) denotes the guaranteed minimum amount of server resources (e.g., the number of servers), Smax#(i) denotes the upper bound on the amount of server resources that a customer may want to obtain when free resources are available, and Mbounds(i) gives two bounds, Mhighbound(i) and Mlowbound(i), on a service level metric M that is used in controlling the allocation of resources beyond the minimum for each i-th customer. Mhighbound(i) is used to decide when to add additional server resources and Mlowbound(i) is used to decide when to remove some server resources.

The minimum (or min) amount of server resources (e.g., number of servers) Smin#(i) is a guaranteed amount of server resources that the i-th customer will receive regardless of the server resource usage. The maximum (or max) amount of server resources Smax#(i) is the upper bound on the amount of server resources that the i-th customer may receive beyond the minimum provided that some unused server resources are available for allocation.

Therefore, the range between Smin#(i) and Smax#(i) represents server resources that are provided on an “as-available” or “best-effort” basis, and it is not necessarily guaranteed that the customer will obtain these resources at any one time, if at all. The allocation of additional resource(s) is performed so as to keep the performance metric within Mbounds(i).
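By way of illustration only, the SLA form (Smin#(i), Smax#(i), Mbounds(i)) might be represented as a small record type. The following is a minimal sketch, not part of the described embodiment; the class name `SLA`, its field names, and the helper methods are hypothetical, and the resource amount is assumed to be a single integer rather than a vector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLA:
    """Hypothetical record for one customer's (Smin#, Smax#, Mbounds)."""
    s_min: int          # Smin#(i): guaranteed minimum amount of resources
    s_max: int          # Smax#(i): upper bound, granted only on a best-effort basis
    m_lowbound: float   # Mlowbound(i): below this, resources may be removed
    m_highbound: float  # Mhighbound(i): above this, resources may be added

    def __post_init__(self):
        # Basic sanity checks (an assumption; the text does not list them).
        if not 0 <= self.s_min <= self.s_max:
            raise ValueError("expected 0 <= Smin# <= Smax#")
        if not self.m_lowbound < self.m_highbound:
            raise ValueError("expected Mlowbound < Mhighbound")

    def best_effort_range(self) -> int:
        """Resource units between Smin# and Smax#, which are not guaranteed."""
        return self.s_max - self.s_min

    def clamp(self, n: int) -> int:
        """Restrict a requested allocation to the interval [Smin#, Smax#]."""
        return max(self.s_min, min(self.s_max, n))


sla = SLA(s_min=2, s_max=6, m_lowbound=0.50, m_highbound=0.80)
print(sla.best_effort_range(), sla.clamp(9), sla.clamp(1))
```

In this sketch, `clamp` reflects the two resource bounds only; whether resources inside the best-effort range are actually granted still depends on the free pool.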

Examples of Mbounds(i) include: (1) the bound on the server resource utilization that is denoted by Ubounds(i); (2) the bound on the average server response time that is denoted by Tbounds(i); and (3) the bound on the server response time percentile that is denoted by T%bounds(i).

Table 1 provides definitions and notations used throughout the present application. For example, when Mbounds(i) = Ubounds(i) = (Ulowbound(i), Uhighbound(i)) = (50%, 80%), the server farm tries to allocate additional server resources (or de-allocate some servers) to the i-th customer's server complex to keep the server resource utilization between 50% and 80%.

That is, when the server resource utilization goes above 80%, the server farm tries to keep the utilization below 80% by allocating additional server resources to the i-th customer when free resources are available. If free resources are not available, the server farm may need to limit the amount of incoming traffic to the i-th customer's server complex. Conversely, when the server resource utilization goes below 50%, the server farm tries to remove some server resources from the i-th customer in order to keep the utilization above 50%.

In order to keep the observed metric M within the given Mbounds, the notion of a “target” metric Mt is introduced. Mt is a value that falls between Mlowbound and Mhighbound, and the system of the present invention tries to keep the observed metric M as close as possible to the target metric Mt by adjusting server resources. In general, the unit cost of the server resources above the minimum guarantee is more than or equal to that of the server resources below the minimum.
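By way of illustration only, the (50%, 80%) utilization example can be expressed as a simple band check. The following sketch uses hypothetical function names. The description states only that Mt falls between Mlowbound and Mhighbound; choosing the midpoint as Mt is an assumption made here for the example.

```python
def check_band(m, low, high):
    """Classify metric m against (Mlowbound, Mhighbound)."""
    if m > high:
        return "above"   # candidate for more resources or for throttling
    if m < low:
        return "below"   # candidate for releasing resources
    return "within"      # no action needed


def midpoint_target(low, high):
    """One possible choice of Mt (an assumption): the middle of the band."""
    return (low + high) / 2.0


for u in (0.85, 0.65, 0.40):
    print(u, check_band(u, 0.50, 0.80))
```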

Thus, the present invention provides a dynamic resource allocation to a plurality of customers to meet (min, max) server resource and performance metric based service level agreements. Unused (un-allocated) server resources are pooled, and resources are allocated from and de-allocated to the pool, thus providing sharing of server resources among a plurality of customers and leading to efficient use of server resources. Since incoming workload is regulated when it has exceeded the server resources allocated, the system provides a “denial of services” to some workloads, thus preventing a crash of hosted customer sites and preventing a crash of the server farm itself.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other purposes, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:

FIG. 1 illustrates an abstracted view of a conventional server farm;

FIG. 2 illustrates a general overview of the operation and structure of the present invention;

FIG. 3 illustrates a concept of a Service Level Agreement (Smin#, Smax#, Mbounds);

FIG. 4 illustrates a graph showing the relationship of Metric M to the number of server resources, to show a concept of the present invention;

FIG. 5 illustrates an overall system 500 and environment of the present invention; and

FIG. 6 illustrates a decision method 600 for server allocation.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION

Referring now to the drawings, and more particularly to FIGS. 1-6, there is shown a preferred embodiment of the method and structure according to the present invention.

Preferred Embodiment

Referring to FIG. 2, prior to describing the details of the invention, an overview and a primary object of the present invention will be described below.

As shown in FIG. 2, the invention first monitors the inbound traffic rate R(i) 206, the currently assigned amount of server resources N(i) 205, and the current service level metric M(i) 204 for all customers 201 and 202.

Then, the inventive system performs the following actions only when M(i) falls outside of Mbounds(i), namely when either M(i) is above Mhighbound(i) or M(i) is below Mlowbound(i), to avoid “allocation/de-allocation swings”.

The “target” amount of server resources Nt(i), without changing the inbound traffic R(i), is computed. Further, the “target” inbound traffic rate Rt(i), without changing the allocated resource N(i), is computed. Both targets are computed from the monitored R(i), N(i) and M(i) for all i, in order to bring the service level metric M(i) close to the “targeted” service level metric Mt(i). The target service level metric Mt(i) is the service level metric at or near which one wants to keep M(i) so that M(i) falls within Mbounds(i)=(Mlowbound(i),Mhighbound(i)).

Once Nt(i) and Rt(i) are computed, it is decided how to move the current M(i) to the target Mt(i), either by changing N(i) to Nt(i) (e.g., this involves either allocating server resources from free resource pool 203 to a customer's server set 201 or 202, or taking some server resources away from customer 201 or 202 and returning them to the pool 203) or by bounding the inbound traffic rate R(i) to Rt(i) (e.g., this is performed when either the maximum amount of resources has already been allocated or no free resource is available, so that the only way to bring M(i) to Mt(i) is to reduce the amount of inbound traffic).

Once the decision has been made, the system sends a request to an appropriate systems resource manager (e.g., a “resource allocation manager” or an “inbound traffic controller”).

FIG. 3 illustrates the concept of the service level agreement (SLA) that the present invention supports for a plurality of customers. The service level agreement for each customer has the form of (Smin#, Smax#, Mbounds), where Smin# is the guaranteed amount of server resources (e.g., the number of servers), Smax# is the upper bound on the total amount of server resources that a customer may obtain when free resources are available, and Mbounds is a pair of bounds on the service level metric that are used in determining when to add additional resources or to remove some resources. For ease of illustration, in FIG. 3, the server resource is assumed to have (reside in) a single dimension. However, this could be a vector.

FIG. 3 shows six operation spaces: A 301, B 302, C 303, D 304, E 305 and F 306. Because of the bounds Smin# 314 and Smax# 313, the feasible operation spaces are B 302 and E 305.

It is noted that the operation space D 304 could be made available, especially when a server farm operator could “borrow” some servers from some customers when the customers are not fully utilizing their resources.

The operation space B 302 is a “non-desirable” space since the service level metric M is exceeding the bound Mhighbound 311. The operation space E 305 is the space in which the operational state should be kept. Furthermore, the upper portion of the space E 305 that is bounded by Mlowbound 312 and Mhighbound 311 is the operation space allowed by the exemplary service level agreement (SLA) that the present invention supports. It is noted that the metric M may be utilization, average response time, percentile response time, etc. Mbounds 307 may be Ubounds, Tbounds, T%bounds, etc. as suitably determined by the designer given constraints and requirements imposed thereon.

FIG. 4 illustrates a primary concept of the present invention. Here, the operation space 305 is divided into two regions. A first region is called a “green belt” 405 (e.g., the region bounded by Mlowbound 312 and Mhighbound 311), and a second region is the remaining space of the space 305.

In the present invention, the operation state which falls into the green belt 405 is deemed to be acceptable, while the operation state which falls outside of the green belt (e.g., below the green belt) is not acceptable since too many unnecessary resources are allocated, thereby incurring extra (wasteful) costs to a customer.

FIG. 4 illustrates the target service level metric Mt 401 with respect to the service level metric bound Mbounds 307 and the green belt 405. Mt 401 is the target value that falls within the green belt 405. The upper bound on the green belt 405 is Mhighbound 311 and the lower bound is Mlowbound 312. The green belt 405 is also bounded by Smin# 314 and Smax# 313. Thus, the green belt 405 is a representation of an SLA of the form (Smin#, Smax#, Mbounds).

An object of the dynamic resource allocation according to the present invention is to keep the operation state within the green belt 405. When the current operation state that is denoted by (M,N,R) is at 403 in the space 305, the primary operation is to reduce the currently allocated amount of resources N to the target amount Nt, so that the service level metric M at 403 would move to the target metric Mt at 404.

When the current operation state that is denoted by (M,N,R) is at 402 in the space 302, the current resource N may be increased to Nt when some free resources are available for allocation, or the inbound traffic R may be reduced to Rt, so that metric M at 402 would move to Mt at 404. When the current state is within the green belt 405, no action is taken. The green belt 405 therefore defines the allowable system operation state region such that any state within the green belt 405 meets the service level agreement (SLA).

FIG. 5 illustrates an overall system 500 according to the present invention including a main system 501, an inbound traffic controller 506, and a server resource manager 509.

The main system 501 includes a decision module and methodology 503 (e.g., algorithm), a module 502 (algorithm) for computing targets Nt(i) and Rt(i), and a repository for storing Service Level Agreements (SLA) 504.

The module 502 computes the target values Nt(i) and Rt(i) from the monitored data M(i) 204, N(i) 205 and R(i) 206 for every customer whenever its operation state (M(i),N(i),R(i)) falls outside of the green belt 405 associated with the customer.

Then the decision module 503, using the SLA information, (M(i),N(i),R(i)), Nt(i) and Rt(i), decides what action to take.

That is, the decision module 503 decides either to change the current resource amount from N(i) to Nt(i) 508, or to bound the current inbound traffic rate R(i) by Rt(i) 505, and then takes the appropriate action.

System 501 has a communications means to instruct “server resource manager” 509 to change resource allocation 510. The system 501 also has a communications means to instruct “inbound traffic controller” 506 to bound the incoming traffic 507 to a specific customer site (201 or 202).

Tables 2 through 5 give various means for computing or deriving target values Nt(i) and Rt(i) for every customer i.

For example, Table 2 describes formulas for computing these targets when the service level metric M is the resource utilization U.

Table 3 describes formulas for computing these targets when the service level metric M is the average response time T. Here, the average response time is derived from the “M/M/m” multi-server queuing model.

It is noted that since the computation is used for the “hill climbing” optimization and is repeated periodically, and the amount of resources allocated or de-allocated at each step is assumed to be very small compared to the amount of resources currently allocated, the use of the “M/M/m” model should be quite acceptable even though the arrival process might not be Poisson and the job processing time might not be exponentially distributed. A major advantage of the “M/M/m” model is that it offers the closed form formulas shown in Table 3.

Table 4 describes formulas for computing these targets when the service level metric M is the response time percentile T %. Again, the “M/M/m” queuing model is assumed in computing the targets.

Table 5 shows that, instead of using a formula to compute the targets (Nt,Rt), one could use any numerical computation tool or quick simulation tool.

FIG. 6 describes the decision method 600 employed by module (algorithm) 503 for server resource allocation in the system 501.

The decision method 600 looks for (e.g., attempts to obtain) a potential revenue maximization opportunity when allocating free resources to various customers. It first seeks any opportunity to de-allocate resources, next allocates additional resources to customers whose service level metric is outside of the green belt 405 (FIG. 4), and finally looks for cases in which a customer's inbound traffic must be throttled (reduced) because free resources are exhausted or the maximum amount of resources has already been allocated.

Method 600 begins at step 601. In step 602, the target values (Nt(i),Rt(i)) are computed for every i. Further, the variable “ITC-informed(i)” is set to “no” for all i. This variable keeps a record of whether or not throttling on inbound traffic has been applied prior to the current computation. This computation or examination is performed periodically to check whether or not any service level agreements have been violated, that is, whether or not any operation states fall outside of their green belts. An examination is conducted once per time interval called a cycle-time. A cycle-time is a system operation configuration parameter. For example, a cycle-time value could be selected from the range between 1 second and 60 seconds. Whether to choose a smaller value or a larger value depends on how fast one can adjust resource allocation/de-allocation.

In step 603, it is determined whether or not the service cycle time has expired. If it has expired (e.g., a “YES” in step 603), the process loops back to step 602.

If “NO” in step 603, then in step 604 it is checked whether the operation state M(i) is within the green belt 405 (e.g., see FIG. 4).

If so (e.g., a “YES”), then step 605 is executed, in which the system waits for the cycle time to elapse, and the process loops back to step 602.

If “NO” in step 604, then in step 606 it is checked whether any customer exists such that the target resource amount Nt(i) is less than the current amount N(i) (i.e., seeking an opportunity to de-allocate server resources from customers and place them back into the pool of “free” resources).

If “YES” in step 606, one possible reason that Nt(i) is less than N(i) is that the inbound traffic has been throttled. This condition is tested at step 607. Step 606 identifies all those customers such that Nt(i) is less than N(i). Step 607 is applied only to those customers identified in step 606. Step 607 checks if there is any customer whose inbound traffic is currently throttled. If step 607 is “YES”, step 609 is executed. Step 609 issues a command to ITC 506 to stop applying the throttling on the i-th customer's inbound traffic and sets ITC-informed(i)=“no”.

When Nt(i) is less than N(i) (“YES” in step 606) and the inbound traffic is not throttled (“NO” in step 607), too many resources have been allocated for the given amount of inbound traffic of the i-th customer, and step 608 seeks to de-allocate resources from the i-th customer.

In step 610, it is checked whether the resource(s) must be increased for any customer identified in step 606. There is no action required for those customers whose target value Nt(i) is equal to the observed value N(i). Step 610 identifies a customer whose server resource must be increased.

If so (“YES” in step 610) and if free resources are available (“YES” in step 611), then step 612 is executed to allocate additional resources (e.g., allocate up to Nt(i)-N(i) resources without exceeding Smax#(i)).

When additional resources must be allocated, and yet no free resource is available (e.g., a “NO” in step 611), then it is necessary to “re-claim” resources from those customers who have more than the guaranteed minimum (e.g., N(j)>Smin#(j)) (step 614).

When additional resource(s) must be allocated (“YES” in step 610), no free resource is available (“NO” in step 611), and the currently allocated resource N(i) is more than or equal to the guaranteed minimum Smin#(i) (“NO” in step 613), then the inbound traffic must be throttled (step 615). That is, the inbound traffic controller 506 is instructed to bound the traffic by Rt(i), and ITC-informed(i) is set to “yes”.
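By way of illustration only, one pass of steps 606 through 615 might be sketched as follows. This is a simplified reading of the decision method, not a definitive implementation: the names `Customer` and `decide` are hypothetical, resources are assumed to be integer units, the targets Nt(i) and Rt(i) are assumed to have been computed already in step 602, and re-claiming (step 614) is only recorded as a request rather than carried out. Never releasing a customer below Smin#(i) is an assumption that follows from the guarantee.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    n: int                   # N(i): currently allocated resources
    n_target: int            # Nt(i): target amount from step 602
    r_target: float          # Rt(i): target inbound traffic rate from step 602
    s_min: int               # Smin#(i)
    s_max: int               # Smax#(i)
    throttled: bool = False  # mirrors ITC-informed(i)


def decide(customers, free):
    """Return (actions, remaining free units) for one examination cycle."""
    actions = []
    # Steps 606-609: first look for opportunities to de-allocate.
    for c in customers:
        if c.n_target < c.n:
            if c.throttled:
                # Step 609: lift the throttle before removing anything.
                c.throttled = False
                actions.append((c.name, "unthrottle", None))
            else:
                # Step 608: release down to Nt(i), but not below Smin#(i).
                release = c.n - max(c.n_target, c.s_min)
                if release > 0:
                    c.n -= release
                    free += release
                    actions.append((c.name, "deallocate", release))
    # Steps 610-615: then serve customers that need more resources.
    for c in customers:
        if c.n_target <= c.n:
            continue
        want = min(c.n_target, c.s_max) - c.n
        grant = min(want, free)
        if grant > 0:
            # Step 612: allocate up to Nt(i) - N(i) without exceeding Smax#(i).
            c.n += grant
            free -= grant
            actions.append((c.name, "allocate", grant))
        elif c.n < c.s_min:
            # Step 614: the guaranteed minimum is not met, so request re-claim.
            actions.append((c.name, "reclaim", c.s_min - c.n))
        else:
            # Step 615: bound the inbound traffic by Rt(i).
            c.throttled = True
            actions.append((c.name, "throttle", c.r_target))
    return actions, free


a = Customer("a", n=5, n_target=3, r_target=100.0, s_min=2, s_max=6)
b = Customer("b", n=2, n_target=4, r_target=50.0, s_min=2, s_max=3)
print(decide([a, b], free=0))
```

In this sketch, customer "a" releases two units, one of which is then granted to customer "b", whose allocation is capped by its Smax#. A second pass with the same targets would throttle "b", since its maximum has already been allocated.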

As described above, with the unique and unobvious features of the present invention, a dynamic resource allocation is provided to a plurality of customers to meet (min, max) server resource and performance metric-based service level agreements.

In describing the embodiment of this invention, a fixed size unit of allocable or de-allocable resources was often assumed. However, one can easily generalize to the case where each allocable unit has a different amount.

Further, it is noted that the method of the invention may be stored on a storage medium as a series of program steps, and can be executed by a digital data processing apparatus.

While the invention has been described in terms of a preferred embodiment, the invention is not limited thereto and those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.

TABLE 1

Smin#(i): the amount of resources guaranteed for the i-th customer. This can be a vector.
Smax#(i): the maximum amount of service resources that could be made available to the i-th customer. This can be a vector.
Mbounds(i): the bounds on the service level metric. Each “bounds” consists of a pair, “highbound” and “lowbound.”
Ubounds(i): the bound on the utilization of resources allocated to the i-th customer.
Tbounds(i): the bound on the agreed upon average server response time for the i-th customer.
T%bounds(i): the bound on the agreed upon server response time percentile for the i-th customer.
(Smin#(i), Smax#(i), Mbounds(i)): the SLA supported by the invention.
N(i): the number (or amount) of resources currently allocated to the i-th customer.
R(i): the current inbound traffic rate for the i-th customer. This could be a vector when more than one type of traffic is defined for each customer.
M(i): the current value of the metric M for the i-th customer. This could be a vector. Examples are:
  U(i): the current utilization of the resources allocated to the i-th customer.
  T(i): the currently observed average server response time for the i-th customer.
  T %(i): the currently observed server response time percentile for the i-th customer.
Mt(i): the “target” (want to achieve) metric value for the i-th customer. Its dimension is the same as the dimension of M(i). This is within the defined “green belt,” which is the region within which M(i) is kept. Examples of Mt(i) are:
  Ut(i): the target resource utilization when M = U.
  Tt(i): the target average response time when M = T.
  Tt %(i): the target percentile response time when M = T %.

TABLE 2

For Utilization as Metric: M = U and Mt = Ut

The following relationships hold among various variables:

  U(i) = C(i)R(i)/N(i), where C(i) is a constant,
  Ut(i) = C(i)R(i)/Nt(i), and
  Ut(i) = C(i)Rt(i)/N(i).

From the above and from the given values of N(i), R(i), U(i), and the target value Ut(i), Nt(i) and Rt(i) can be computed as follows:

  Nt(i) = CEILING [N(i)U(i)/Ut(i)], and
  Rt(i) = FLOOR [R(i)Ut(i)/U(i)],

where CEILING gives the smallest integer not less than its argument and FLOOR gives the largest integer not exceeding its argument.
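By way of illustration only, the two Table 2 formulas can be evaluated directly. The function name `utilization_targets` and the sample values below are hypothetical and are chosen only to show the arithmetic.

```python
import math


def utilization_targets(n, r, u, u_target):
    """Apply Nt = CEILING[N*U/Ut] and Rt = FLOOR[R*Ut/U] from Table 2."""
    if u <= 0 or u_target <= 0:
        raise ValueError("utilization values must be positive")
    n_target = math.ceil(n * u / u_target)   # resources needed at unchanged traffic
    r_target = math.floor(r * u_target / u)  # traffic bound at unchanged resources
    return n_target, r_target


# Example: 4 servers at 90% utilization, 200 requests/s, target utilization 65%.
print(utilization_targets(n=4, r=200, u=0.90, u_target=0.65))  # (6, 144)
```

With these sample values, 4 x 0.90 / 0.65 is about 5.54, so six servers would be needed at the current traffic rate; alternatively, the traffic would be bounded at 144 requests per second with the current four servers.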

TABLE 3

For Average Response Time as Metric: M = T and Mt = Tt

S(i): server “service” (or processing) time for the i-th customer. This can be computed from observing each individual server service time, or estimated from a queueing formula in which S(i) is a function of {T(i), R(i), N(i)}.

If the cluster of servers is modeled by the M/M/m queueing system,

  S(i) = ((R(i)T(i) + N(i) + p{N(i)}) − SQRT((R(i)T(i) + N(i) + p{N(i)})**2 − 4R(i)T(i)N(i))) / (2R(i))

where p{m} is the probability that there are m requests in the i-th customer's server cluster.

For the M/M/m queueing model,

  Tt(i) ~ S(i) + p{Nt(i)}S(i)/(Nt(i) − R(i)S(i))
  Tt(i) ~ S(i) + p{N(i)}S(i)/(N(i) − Rt(i)S(i))

Therefore,

  Nt(i) = CEILING [R(i)S(i) + p{Nt(i)}S(i)/(Tt(i) − S(i))]
  Rt(i) = FLOOR [N(i)/S(i) − p{N(i)}/(Tt(i) − S(i))]
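By way of illustration only, the Table 3 relationships can be sketched numerically. The sketch below makes a simplifying assumption: the probability term p is supplied by the caller as a fixed number and is used for both p{N(i)} and p{Nt(i)}, whereas in the table p{Nt(i)} depends on the target itself. The function names are hypothetical. The service time is taken as the smaller root of the quadratic R*S^2 − (R*T + N + p)*S + T*N = 0, which is what the relation T = S + p*S/(N − R*S) rearranges to.

```python
import math


def service_time(t, r, n, p):
    """Estimate S from observed T, R, N and a given p (smaller quadratic root)."""
    b = r * t + n + p
    disc = b * b - 4.0 * r * t * n   # non-negative whenever p >= 0
    return (b - math.sqrt(disc)) / (2.0 * r)


def response_time_targets(t_target, r, n, s, p):
    """Apply the Nt and Rt formulas of Table 3 with a fixed p (an assumption)."""
    if t_target <= s:
        raise ValueError("target response time must exceed the service time")
    n_target = math.ceil(r * s + p * s / (t_target - s))
    r_target = math.floor(n / s - p / (t_target - s))
    return n_target, r_target


# Example: observed T = 0.11 s at R = 20 requests/s on N = 4 servers, with p = 0.2.
s = service_time(t=0.11, r=20.0, n=4, p=0.2)
print(round(s, 6))
print(response_time_targets(t_target=0.107, r=20.0, n=4, s=s, p=0.2))
```

With these sample values the estimated service time is about 0.1 s, and reaching the tighter target of 0.107 s would call for either five servers or a traffic bound of 11 requests per second.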

TABLE 4

For Percentile Response Time as Metric: M = T % and Mt = Tt %

If T %(i) > T%bound(i), then the average response time T(i) needs to be reduced by (T %(i) − T%bound(i)). Therefore, for T %(i) to approach T%bound(i), the average response time target Tt(i) becomes:

  Tt(i) = T(i) − (T %(i) − T%bound(i)).

For the M/M/m queueing model,

  Tt(i) ~ S(i) + p{Nt(i)}S(i)/(Nt(i) − R(i)S(i))
  Tt(i) ~ S(i) + p{N(i)}S(i)/(N(i) − Rt(i)S(i))

and thus,

  Nt(i) = CEILING [R(i)S(i) + p{Nt(i)}S(i)/(Tt(i) − S(i))]
  Rt(i) = FLOOR [N(i)/S(i) − p{N(i)}/(Tt(i) − S(i))]

where p{m} is the probability that there are m requests in the customer's server cluster.
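By way of illustration only, the first step of Table 4 converts a percentile violation into a target for the average response time, which can then be fed into the Table 3 formulas. The function name below is hypothetical. Returning the unchanged average when the percentile is already within its bound is an assumption, since the table only covers the violating case.

```python
def average_target_from_percentile(t_avg, t_pct, t_pct_bound):
    """Tt = T - (T% - T%bound) when the percentile exceeds its bound."""
    if t_pct <= t_pct_bound:
        return t_avg  # assumption: no adjustment when the bound is met
    return t_avg - (t_pct - t_pct_bound)


# Example: average 0.11 s, observed percentile 0.30 s, percentile bound 0.25 s.
print(average_target_from_percentile(0.11, 0.30, 0.25))
```

A caller would still need to check that the resulting target exceeds the service time S(i), since the Nt(i) and Rt(i) formulas divide by (Tt(i) − S(i)).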

TABLE 5

For any given metric M, quick simulation tools, quick numerical computation tools and other approximation formulas are available for computing Nt(i) and Rt(i) from given (i.e., measured) values of R(i), N(i) and M(i).
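By way of illustration only, one simple numerical alternative to a closed-form formula is an add-one search: the resource amount is increased one unit at a time until a model of the metric predicts a value at or below the target. The function names, the `predict` callback, and the constant used in the sample model are all hypothetical; any simulation or approximation that maps (N, R) to a predicted M could be substituted.

```python
def add_one_search(predict, n, r, m_target, n_max):
    """Increase N by one until predict(N, R) <= Mt or n_max is reached."""
    n_target = n
    while n_target < n_max and predict(n_target, r) > m_target:
        n_target += 1
    return n_target


def utilization_model(n, r, c=0.018):
    """Sample model U = C*R/N, with an assumed constant C."""
    return c * r / n


# Example: start from 4 servers at 200 requests/s with target utilization 65%.
print(add_one_search(utilization_model, n=4, r=200, m_target=0.65, n_max=10))  # 6
```

A subtract-one search for releasing resources would follow the same pattern in the opposite direction, stopping before the predicted metric rises above the target.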

1. A method for managing and controlling allocation and de-allocation ofresources based on a guaranteed amount of resource and additionalresources based on a best effort for a plurality of customers, saidmethod comprising: dynamically allocating server resources for aplurality of customers, such that said resources received by a customerare dynamically controlled and said customer receives a guaranteedminimum amount of resources as specified under a service level agreement(SLA), wherein said best effort is defined in said SLA as a range ofservice to be provided to said customer if said server resources arecurrently available; and designating a service level agreement (SLA) ona server resource for a customer as a form (Smin#(i), Smax#(i),Mbounds(i)), where Smin#(i) denotes a guaranteed minimum amount ofserver resources, Smax(i) denotes an upper bound on an amount of serverresources that a customer desires to obtain when free resources areavailable, and Mbounds(i) that includes a low bound (Mlowbound(i)) and ahigh bound (Mhighbound(i)) designating bounds on a service level metricfor allocating resources beyond the minimum amount Smin#(i) for eachi-th customer.
 2. The method according to claim 1, further comprising:utilizing a performance metric to increase or decease an inbound trafficto a customer.
 3. The method according to claim 1, further comprising:supporting minimum and maximum server resource-based service levelagreements for a plurality of customers.
 4. The method according toclaim 1, further comprising: utilizing performance metrics to controlthe allocation of additional server resources to a plurality ofcustomers using bounds on given service level metrics.
 5. The methodaccording to claim 1, further comprising: supporting a plurality ofservice level metrics.
 6. The method according to claim 1, furthercomprising: selectively utilizing a plurality of different metrics for aplurality of different customers.
 7. The method according to claim 1,further comprising: utilizing a service level metric, an amount ofallocable resources, and an inbound traffic rate, for defining a stateof a current service level (M,N,R) for each customer.
 8. The methodaccording to claim 1, further comprising: utilizing a target servicelevel metric Mt to maintain an actual service level M substantially ator near a target service level so as to be guaranteed to fall betweenlow and high bounds (Mlowbound and Mhighbound) specified in a servicelevel agreement (SLA).
 9. The method according to claim 1, furthercomprising: computing a target amount of resources Nt and an inboundtraffic rate Rt from a given target service level metric Mt and (M,N,R).10. The method according to claim 1, further comprising: performing atleast one of a numerical analysis, a mathematical formulaic operation,an add-one/subtract-one, and a quick simulation for deriving a targetamount of resources Nt and an inbound traffic rate Rt.
 11. The method according to claim 1, further comprising: supporting a resource utilization U for an actual service level M, average response time T for an actual service level M, and a response time percentile T% for an actual service level M, thereby to support targets of Ut, Tt and Tt%.
 12. The method according to claim 1, further comprising: deciding whether or not to add a server resource or to reduce an inbound traffic rate to meet service level agreements for a plurality of customers.
 13. The method according to claim 1, further comprising: providing a server farm including means for dynamically allocating servers or server resources to customers as demands of said customers change.
 14. The method according to claim 1, wherein a minimum amount of server resources Smin#(i) comprises a guaranteed amount of server resources that the i-th customer will receive regardless of the server resource usage, and wherein a maximum amount of server resources Smax#(i) comprises the upper bound on the amount of server resources that the i-th customer may receive beyond the minimum amount provided that some unused server resources are available for allocation.
 15. The method according to claim 14, wherein a range between Smin#(i) and Smax#(i) represents server resources that are provided on an as-available basis, such that the customer is not guaranteed to obtain these resources at any one time, if at all.
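As an illustration of the SLA form (Smin#(i), Smax#(i), Mbounds(i)) recited in claims 1, 14 and 15, the following is a minimal Python sketch. The class name, field names, and the `allocation` helper are hypothetical and do not appear in the claims; the sketch assumes resources are counted in whole units and that the customer always holds at least the guaranteed minimum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevelAgreement:
    """Hypothetical container for the SLA form (Smin#(i), Smax#(i), Mbounds(i))."""

    s_min: int           # Smin#(i): guaranteed minimum amount of server resources
    s_max: int           # Smax#(i): upper bound when free resources are available
    m_low_bound: float   # Mlowbound(i): low bound on the service level metric
    m_high_bound: float  # Mhighbound(i): high bound on the service level metric

    def __post_init__(self):
        if not 0 <= self.s_min <= self.s_max:
            raise ValueError("require 0 <= s_min <= s_max")
        if self.m_low_bound > self.m_high_bound:
            raise ValueError("require m_low_bound <= m_high_bound")

    def best_effort_range(self) -> int:
        """Units between Smin# and Smax#, provided only on an as-available basis."""
        return self.s_max - self.s_min

    def allocation(self, requested: int, free: int) -> int:
        """Units held by the customer (assumed policy, not claim language).

        The guaranteed minimum is always held; anything beyond it is limited
        by the request, by the best-effort range, and by the free pool.
        """
        extra_wanted = max(requested - self.s_min, 0)
        extra = min(extra_wanted, self.best_effort_range(), max(free, 0))
        return self.s_min + extra


sla = ServiceLevelAgreement(s_min=2, s_max=5, m_low_bound=0.5, m_high_bound=0.8)
print(sla.allocation(requested=4, free=1))    # 3: minimum plus one free unit
print(sla.allocation(requested=10, free=10))  # 5: capped at Smax#
```

Under this assumed policy a customer never drops below Smin#(i), and the units between Smin#(i) and Smax#(i) are granted only when the free pool can supply them.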
 16. A method for managing and controlling allocation and de-allocation of resources based on a guaranteed amount of resource and additional resources based on a best effort for a plurality of customers, said method comprising: dynamically allocating server resources for a plurality of customers, such that said resources received by a customer are dynamically controlled and said customer receives a guaranteed minimum amount of resources as specified under a service level agreement (SLA), wherein said best effort is defined in said SLA as a range of service to be provided to said customer if said server resources are currently available, wherein an allocation of an additional resource is performed so as to keep the performance metric within Mbounds(i), and wherein said Mbounds(i) includes any one of bounds on the server resource utilization that are denoted by Ubounds(i), bounds on the average server response time that are denoted by Tbounds(i), and bounds on the server response time percentile that are denoted by T%bounds(i).
 17. A method for managing and controlling allocation and de-allocation of resources based on a guaranteed amount of resource and additional resources based on a best effort for a plurality of customers, said method comprising: dynamically allocating server resources for a plurality of customers, such that said resources received by a customer are dynamically controlled and said customer receives a guaranteed minimum amount of resources as specified under a service level agreement (SLA), wherein said best effort is defined in said SLA as a range of service to be provided to said customer if said server resources are currently available; and when a server resource utilization goes above a predetermined set limit Mhighbound(i), attempting, by a server farm, to maintain the utilization between said predetermined set limits Mbounds(i) by allocating additional server resources to the i-th customer when free resources are available.
 18. The method according to claim 17, further comprising: if free resources are not available, then limiting, by the server farm, an amount of incoming traffic to the i-th customer's server.
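The reaction to high utilization recited in claims 17 and 18 can be sketched as a small Python function. The function name, the returned action strings, and the one-unit allocation step are assumptions made for illustration only; the claims do not specify how much is allocated per step.

```python
def react_to_utilization(utilization, m_high_bound, free_units):
    """Pick an action for the i-th customer (hypothetical helper).

    utilization  -- current server resource utilization of the customer
    m_high_bound -- Mhighbound(i) from the customer's SLA
    free_units   -- unallocated resource units in the server farm
    """
    if utilization <= m_high_bound:
        # Utilization has not gone above the high bound: nothing to do here.
        return "no_action"
    if free_units > 0:
        # Claim 17: allocate additional resources when free ones exist.
        return "allocate_one_unit"
    # Claim 18: no free resources, so limit incoming traffic instead.
    return "limit_incoming_traffic"


print(react_to_utilization(0.9, 0.8, free_units=2))  # allocate_one_unit
print(react_to_utilization(0.9, 0.8, free_units=0))  # limit_incoming_traffic
```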
 19. The method according to claim 1, further comprising: controlling a dynamic resource allocation to said plurality of customers to meet a value between the minimum and maximum server resources and performance metric-based service level agreements.
 20. A method for managing and controlling allocation and de-allocation of resources based on a guaranteed amount of resource and additional resources based on a best effort for a plurality of customers, said method comprising: dynamically allocating server resources for a plurality of customers, such that said resources received by a customer are dynamically controlled and said customer receives a guaranteed minimum amount of resources as specified under a service level agreement (SLA), wherein said best effort is defined in said SLA as a range of service to be provided to said customer if said server resources are currently available; monitoring an inbound traffic rate R(i), a currently assigned amount of server resources N(i), and a current service level metric M(i) for all of said plurality of customers; computing a target amount of server resources Nt(i), without changing an inbound traffic R(i); and computing a target inbound traffic rate Rt(i), without changing an allocated resource N(i), to bring the service level metric M(i) to the targeted service level metric Mt(i) from monitored R(i), N(i) and M(i) for all i, wherein the target service level metric Mt(i) comprises the service level metric substantially at or near where M(i) is to be maintained, and bounded by Mbounds(i).
 21. The method according to claim 20, further comprising: determining how to adjust a current M(i) to the target Mt(i), by one of changing N(i) to Nt(i) and by bounding the inbound traffic rate R(i) to Rt(i).
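The target computations of claims 20 and 21 can be illustrated with a simple closed-form sketch in Python. The claims do not fix a formula, so the model below is purely an assumption: the service level metric M(i) is taken to be a utilization that scales in proportion to R(i)/N(i). The function names are hypothetical.

```python
import math


def target_resources(m, n, m_target):
    """Nt(i) with R(i) unchanged, assuming M is proportional to R / N.

    Rounded up because resources are assumed to come in whole units.
    """
    if m_target <= 0 or n <= 0:
        raise ValueError("m_target and n must be positive")
    return math.ceil(n * m / m_target)


def target_traffic_rate(m, r, m_target):
    """Rt(i) with N(i) unchanged, under the same proportionality assumption."""
    if m <= 0:
        raise ValueError("m must be positive")
    return r * m_target / m


# Monitored state (M, N, R) = (0.75, 4, 120.0) and target Mt = 0.5.
print(target_resources(m=0.75, n=4, m_target=0.5))         # 6
print(target_traffic_rate(m=0.75, r=120.0, m_target=0.5))  # 80.0
```

In this example the metric can be brought to its target either by growing the allocation from 4 to 6 units or by bounding the inbound traffic rate at 80.0, which are the two alternatives named in claim 21.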
 22. The method according to claim 21, further comprising: requesting a system resource manager to perform the resource allocation.
 23. The method according to claim 22, further comprising: requesting an inbound traffic controller to throttle an amount of inbound traffic to the plurality of customers.
 24. The method according to claim 1, further comprising: maximizing revenue potential when allocating resources beyond a minimum amount for a customer.
 25. The method according to claim 1, wherein a unit of said resources comprises a fixed size unit of allocable or de-allocable resources.
 26. The method according to claim 1, wherein a unit of each allocable resource has a different amount.
 27. A method of deciding server resource allocation for a plurality of customers, said method comprising: computing target values (Nt(i),Rt(i)) for every customer i and setting a variable “ITC-informed(i)”=“no” for all customers “i” such that a record is kept of whether or not throttling on inbound traffic is being applied during a given service cycle time; determining whether or not the service cycle time has expired; if the service cycle time has not expired, then checking whether an operation state M(i) is within a predetermined area defined by a metric and a number of resources; if the operation state is not within the predetermined area, then checking whether any customer exists such that a target resource amount Nt(i) is less than a current resource amount N(i); if Nt(i) is less than N(i), then determining whether the inbound traffic has been throttled, by determining whether, for any “i”, ITC-informed(i)=“yes”; and if the inbound traffic has been throttled, then removing the throttling by directing an inbound traffic controller to stop throttling the i-th traffic class and setting ITC-informed(i)=“no”, wherein said target values (Nt(i),Rt(i)) comprise parameters contained in a Service Level Agreement (SLA) for said customer i as related to a best effort basis for managing and controlling allocation and de-allocation of resources to said customer i, and said best effort is defined in said SLA as a range of service to be provided to said customer i if said server resources are currently available.
 28. The method according to claim 27, further comprising: when Nt(i) is less than N(i) and it is determined that the inbound traffic is not throttled, deallocating resources from said customers.
 29. The method according to claim 28, further comprising: determining whether the resources must be increased by selecting any i and determining whether Nt(i) is greater than N(i).
 30. The method according to claim 29, further comprising: if it is determined that Nt(i) is greater than N(i) and if free resources are judged to be available, then allocating additional resources up to Nt(i)-N(i) resources without exceeding a maximum amount of server resources Smax#(i).
 31. The method according to claim 29, further comprising: if it is determined that Nt(i) is greater than N(i) and if free resources are judged to be unavailable and if the currently allocated resource N(i) is less than the guaranteed minimum Smin#(i), then reclaiming resources from those customers j having more than a guaranteed minimum such that N(j)>Smin#(j).
 32. The method according to claim 29, further comprising: if it is determined that Nt(i) is greater than N(i) and if free resources are judged to be unavailable and if the currently allocated resource N(i) is more than or equal to the guaranteed minimum Smin#(i), then throttling the inbound traffic.
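The per-customer branching recited in claims 27 through 32 can be sketched in Python as follows. This is a simplified illustration, not the claimed method itself: the function name, the action labels, the `throttled` flag (standing in for ITC-informed(i)=“yes”), and the choice never to release units below Smin#(i) are all assumptions. Throttling when Smax#(i) has already been reached follows the condition mentioned in claim 35.

```python
def decide(n, n_target, s_min, s_max, free, throttled):
    """Return (action, units) for customer i (hypothetical helper).

    n, n_target  -- current N(i) and target Nt(i) resource amounts
    s_min, s_max -- Smin#(i) and Smax#(i) from the SLA
    free         -- free resource units in the farm
    throttled    -- True when ITC-informed(i) is "yes"
    """
    if n_target < n:
        if throttled:
            # Claim 27: remove throttling before giving resources back.
            return ("stop_throttling", 0)
        # Claim 28: deallocate; keeping Smin#(i) is an assumed safeguard.
        release = n - max(n_target, s_min)
        return ("deallocate", release) if release > 0 else ("none", 0)
    if n_target > n:
        # Claim 30: allocate up to Nt(i)-N(i) without exceeding Smax#(i).
        grant = min(n_target - n, free, s_max - n)
        if grant > 0:
            return ("allocate", grant)
        if n < s_min:
            # Claim 31: reclaim from customers j with N(j) > Smin#(j).
            return ("reclaim", s_min - n)
        # Claim 32: at or above the guaranteed minimum, throttle traffic.
        return ("throttle", 0)
    return ("none", 0)


print(decide(n=3, n_target=6, s_min=2, s_max=8, free=2, throttled=False))
print(decide(n=1, n_target=4, s_min=2, s_max=8, free=0, throttled=False))
print(decide(n=4, n_target=6, s_min=2, s_max=8, free=0, throttled=False))
```

The three calls exercise the allocate, reclaim, and throttle branches respectively; a real resource manager would also have to pick which customers j to reclaim from, which this sketch leaves out.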
 33. The method according to claim 32, further comprising: bounding, by the inbound traffic controller, the traffic by Rt(i).
 34. The method according to claim 27, further comprising: searching for a potential revenue maximization opportunity when allocating free resources to various customers.
 35. The method according to claim 34, further comprising: first seeking to de-allocate resources, then allocating additional resources to customers whose service level metric is outside of a predetermined area, and thirdly searching for when the customer's inbound traffic must be throttled due to exhaustion of free resources or the maximum amount of resources has been already allocated.
 36. A system for managing and controlling allocation and de-allocation of resources based on a guaranteed amount of resources and additional resources based on a best effort for a plurality of customers, said system comprising: a plurality of servers; and a resource allocation device for dynamically allocating server resources for a plurality of customers, such that said resources received by a customer are dynamically controlled and said customer receives a guaranteed minimum amount of resources as specified under a best effort agreement in a service level agreement (SLA) with said customer, wherein said best effort is defined in said SLA as a range of service to be provided to said customer if said server resources are currently available.